Characterisation of the blood RNA host response underpinning severity in COVID-19 patients

Infection with SARS-CoV-2 has highly variable clinical manifestations, ranging from asymptomatic infection through to life-threatening disease. Host whole blood transcriptomics can offer unique insights into the biological processes underpinning infection and disease, as well as severity. We performed whole blood RNA Sequencing of individuals with varying degrees of COVID-19 severity. We used differential expression analysis and pathway enrichment analysis to explore how the blood transcriptome differs between individuals with mild, moderate, and severe COVID-19, performing pairwise comparisons between groups. Increasing COVID-19 severity was characterised by an abundance of inflammatory immune response genes and pathways, including many related to neutrophils and macrophages, in addition to an upregulation of immunoglobulin genes. In this study, for the first time, we show how immunomodulatory treatments commonly administered to COVID-19 patients greatly alter the transcriptome. Our insights into COVID-19 severity reveal the role of immune dysregulation in the progression to severe disease and highlight the need for further research exploring the interplay between SARS-CoV-2 and the inflammatory immune response.

www.nature.com/scientificreports/ the main risk factors for developing severe COVID-19 3,4 . Dysfunctional immune responses to SARS-CoV-2, specifically impaired type I interferon responses in conjunction with an exacerbated inflammatory response, have been associated with progression to severe COVID-19 [5][6][7] . An imbalanced immune response leads to the development of a 'cytokine storm' which causes lung inflammation, septic shock, and multi-organ failure 5,7,8 . Identification of improved treatments and prevention of severe COVID-19 requires better understanding of the underlying immune and inflammatory processes that distinguish severe disease from mild illness. Host whole blood transcriptomic profiling of patients with infectious and inflammatory conditions has been extensively used for understanding infectious disease dynamics, from the identification of accurate biomarkers of infection [9][10][11] to gaining insights into variations in the host response to different pathogens and severity of disease 12,13 . Targeted 6,14 and untargeted [15][16][17][18] transcriptomic profiling of whole blood from SARS-CoV-2-positive hosts has already been undertaken. To the best of our knowledge, severity of COVID- 19 has not yet been explored in whole blood with non-hospitalised SARS-CoV-2-positive cases included as a comparator group, nor has it been explored across a range of severities. Furthermore, the impacts of severity on the transcriptome have not yet been explored in an approach that accounts for the treatment regiments received by the patients. The study of transcriptomic profiles from individuals with varying severities will be an essential tool for improving our understanding of the course of disease resulting from SARS-CoV-2 infection.
We have analysed the whole blood transcriptomes obtained from individuals with different levels of COVID-19 severity to explore how the host blood transcriptome changes with increasing COVID-19 severity, aiming to identify the key biological processes and genes underpinning differences in severity.
Subjects granted informed consent for their participation in the study. If this was not possible at the time of sampling, deferred consent was allowed, and subjects were approached for consent at the earliest appropriate opportunity. Subjects who did not agree to participate in the study were excluded. GEN-COVID Study was approved by the Ethics Committee of Galicia by fast-track procedure on 18 th March 2020 (CEIC Galicia, reg 2020/178).
Patients with COVID-19 were categorised as having mild, moderate, or severe disease. Mild patients were those who were always outpatients; emergency department attendance was the ceiling of care, and they were not admitted to hospital (WHO score 1-2). Moderate patients were those admitted to hospital, for whom ward-based therapy was the ceiling of care with supportive care limited to oxygen delivery; no intensive care unit (ICU) admission (WHO score [3][4]. Severe patients were those who were admitted to ICU at any time throughout the course of their disease (WHO score 5-7). Supportive care included high flow oxygen (> 16 L/ minute), noninvasive ventilation (NIV), invasive ventilation, inotropic support, renal replacement therapy, and extracorporeal membrane oxygenation (ECMO). The severe category also included those patients who died (WHO score 8), in the emergency department, ward or ICU. Research blood samples were taken following admission to hospital for moderate and severe COVID-19 patients. For mild patients, patients were recruited via telephone following a positive SARS-CoV-2 test, and visited at home by the research team to obtain a research blood sample and consent.
RNA isolation and quantification. Whole blood was collected at the time of recruitment into PAXgene blood RNA tubes (PreAnalytiX), frozen, and total RNA (including RNA > 18 nucleotides) was isolated according to the manufacturer's instructions (Qiagen). RNA samples were stored at − 80 °C, before undergoing an additional DNAse treatment using an RNA clean & concentrator kit (Zymo Research) prior to sequencing at The Wellcome Centre for Human Genetics in Oxford, UK. Material was quantified using RiboGreen (Invitrogen) on the FLUOstar OPTIMA plate reader (BMG Labtech) and the size profile and integrity analysed on the 2200 TapeStation (Agilent, RNA ScreenTape). Input material was normalised and strand specific library preparation was completed using NEBNext® Ultra™ II mRNA kit (NEB) and NEB rRNA/globin depletion probes following manufacturer's instructions. Libraries were on a Tetrad (Bio-Rad) using in-house unique dual indexing primers (based on 19 ). Individual libraries were normalised using Qubit and pooled together. The pooled library was diluted to ~ 10 nM for storage and denatured and further diluted prior to loading on the sequencer. Paired end sequencing was performed using a Novaseq6000 platform at 150 paired end configuration. The RNA-Seq analysis pipeline consisted of quality control using FastQC 20 , MultiQC 21 and annotations modified with BEDTools 22 , alignment and read counting using STAR 23 , SAMtools 24 , FeatureCounts 25 and version 89 ensembl GCh38 genome and annotation 26 . Statistical analysis. All statistical analyses were performed using the statistical software R (R version 4.0.3) 27 . Normalised counts were calculated for each gene using DESeq2 (V1.30.0) 28 and default parameters were used. Normalised genes with fewer than three samples with a normalised read count of at least 20 were considered lowly expressed and were removed, leaving a total of 20,536 genes. Principal component analysis (PCA) was performed on the normalised counts.
Immune cell level measurements were not available for all individuals included in the analysis. Therefore, cell-type fractions were estimated from the bulk host transcriptome data using the CIBERSORTx algorithm 29 . The estimated fractions were compared across the three COVID-19 patient groups (mild, moderate, and severe) and statistical significance was evaluated using the Kruskal-Wallis test followed by the pairwise Dunn's test with www.nature.com/scientificreports/ p-values adjusted using the Benjamini-Hochberg (BH) correction (moderate vs. mild; severe vs. mild; severe vs. moderate). Adjusted p-values < 0.05 were considered significant. Immune cell proportions were explored in relation to age and sex. Pairwise Mann-Whitney U tests were performed contrasting males vs. females within each severity group and generalised linear models (GLMs) were performed testing the relationship between age and immune cell proportions with each severity group, with p-values adjusted using the BH adjustment. DESeq2 28 was used for differential expression analysis (further information in Supplementary Methods). Prior to exploring the impact of COVID-19 severity on the transcriptome, the effect of immunomodulatory treatment on the transcriptome was assessed through differential expression analysis using DESeq2 28 . Patients with moderate COVID-19 who were not receiving steroids at the time of sampling were contrasted against patients with moderate COVID-19 who were receiving steroids at the time of sampling, whilst accounting for age and sex in the model. Steroids were chosen as the immunomodulatory treatment to explore due to adequate sample size in comparison to other immunomodulatory treatment groups such as tocilizumab. Equally, moderate COVID-19 samples were chosen due to the group size. Patients receiving tocilizumab or interferon therapies were excluded from this comparison. Treatments including antibiotics, antivirals and antimalarials were allowed.
To determine whether administration of antivirals influenced the host transcriptome, samples from moderate COVID-19 patients who received antivirals were contrasted against samples from moderate COVID-19 patients who did not receive antivirals using DESeq2 28 with a model accounting for age and sex.
DESeq2 28 was used for differential expression analysis of COVID-19 severity groups. We assessed transcriptomic differences with increasing COVID-19 severity by carrying out pairwise comparisons between each severity group (i.e., moderate vs. mild; severe vs. mild; and severe vs. moderate). We used two different model designs for each comparison. First, the models included immunomodulatory treatment status, sex, age, and severity. This model design aimed to account for transcriptomic differences induced by immunomodulatory treatments by including variables representing whether the patients received tocilizumab, steroids, or interferon treatment, in addition to sex, age and severity. The second model design included in silico immune cell proportion estimates, sex, age, and severity. This design accounted for transcriptomic differences induced by different proportions of immune cells, and it included the immune cell fractions described below as well as sex, age, and severity. Adjusted p-values were calculated using the BH procedure 30 . The log 2 fold-changes (LFC) and adjusted p-values of all genes were visualised using volcano plots. Concordance and discordance resulting from different model designs were visualised using cross plots. Genes with an adjusted p-value < 0.05 were considered significantly differentially expressed (SDE). The lists of SDE genes were subjected to pathway analysis using Ingenuity Pathway Analysis (IPA; QIAGEN Inc., https:// www. qiage nbioi nform atics. com/ produ cts/ ingen uity-pathw ayanaly sis). IPA was selected because it can predict directionality of pathways through knowledge of molecular functions, and it returns a z-score for each predicted pathway. Positive and negative z-scores indicate that the pathway is upregulated or downregulated, respectively, in the group of interest compared against the reference group. Z-scores are calculated using the log2 fold change values obtained by the differential expression analysis with higher absolute z-scores representing a larger degree of change (further details in Supplementary Methods).
Severity was also explored as an additive variable (mild = 0; moderate = 1; severe = 2), and by contrasting samples from hospitalised COVID-19 patients (moderate and severe) to samples from non-hospitalised COVID-19 patients (mild) using DESeq2, with full methods described in the Supplementary Methods.

Results
A schematic showing an overview of the patients, analysis, and key findings is shown in Fig. 1.
Clinical description of patients. Whole blood transcriptomic profiling through RNA Sequencing (RNA-Seq) was performed on 65 samples from patients recruited through the GEN-COVID study. Samples from patients in whom a pathogen in addition to SARS-CoV-2 was isolated less than 5 days before or 10 days after the research blood sample were not selected (n = 10), as the aim was to identify gene expression changes in blood that reflect COVID-19 disease and not coinfections. Of the 55 remaining COVID-19 patients, 19, 26 and 10 patients had mild, moderate, and severe disease, respectively. In all but two cases, the severity categorisation of the patient matched their level of supportive care at the time the research blood sample. For these two patients, the decision to transfer them to ICU was made within 36 h of the sample extraction, therefore they were classified as severe in our analyses.
The research blood sample was taken at home for all mild COVID-19 patients since they were non-hospitalised. Following a positive SARS-CoV-2 test, mild patients were contacted via telephone and visited by the study team at home where the research blood sample was taken. For moderate and severe COVID-19 cases, the research blood sample was taken following admission to hospital at a median of 5 days (IQR: 4-7) and 7 days (IQR: 2.8-8.8) following admission for moderate and severe COVID-19, respectively ( Table 1).

Investigations
White cells (

Maximum level of care/outcome
Ambulatory 19 (100%) 0 (0%) 0 (0%) Admitted to ward 0 (0%) 26 (100%) 0 (0%) Admitted to ICU 0 (0%) 0 (0%) 10 (100%) Died 0 (0%) 0 (0%) 0 (0%) Table 2. WHO Severity Classification Score for the COVID-19 patients included in the analysis. This is the patients' final severity score rather than their severity at the time of sampling (as shown in Table 1).  (Table 1), including immunomodulatory therapies. To assess the impact of immunomodulatory therapies on the transcriptome of individuals with COVID-19, we identified genes SDE between patients with moderate COVID-19 who did not receive steroids (n = 19) and patients with moderate COVID-19 who did receive steroids, excluding those who also received monoclonal antibody therapy (n = 6). Individuals who received steroids in combination with antibiotics, antivirals and antimalarials were included in the comparisons (Supplementary Table S2). 556 genes were SDE in patients who received steroids vs. those who did not (BH p-value < 0.05), of which 253 genes were overexpressed with steroid administration and 303 were under-expressed (Supplementary File 1). No significant pathways were identified by IPA. 84.6% (n = 22) of the moderate COVID-19 patients were administered antivirals (lopinavir-ritonavir). The transcriptome profiles of moderate COVID-19 patients who received antivirals were contrasted against the transcriptome profiles of moderate COVID-19 patients who did not receive antivirals (n = 24) to determine whether antiviral administration influenced the transcriptome. Using a DESeq2 model accounting for age and sex, only one SDE gene was identified as SDE between moderate COVID-19 patients who did and did not receive antivirals (lopinavir-ritonavir). The gene was DND1P1 (BH p-value: 0.040; LFC: 0.989). DND1P1 was not significant in any downstream analyses.

Differential gene expression analysis of COVID-19 severity. Pairwise comparisons were made
between the three severity groups (mild, moderate and severe) to identify transcriptomic differences with increasing severity. In the moderate vs. mild COVID-19 comparison, there was greater concordance between the models accounting for immunomodulatory treatment and immune cell proportions as the number of genes identified as SDE in both models was higher than for the severe vs. mild and the severe vs. moderate comparisons (Fig. 2).
Severe COVID-19 vs. mild COVID-19. 7343 genes were SDE between severe (n = 10) and mild COVID-19 (n = 19) whilst accounting for immunomodulatory treatment with 3329 and 4014 genes over-and underexpressed with increasing severity, respectively (Supplementary File 3). 94 genes were SDE between severe and mild COVID-19 whilst accounting for immune cell proportions with 81 and 13 genes over-and under-expressed with increasing severity, respectively. 87 genes were SDE between severe and mild COVID-19 irrespective of the model design (Fig. 2B).

Discussion
We have explored host whole-blood transcriptomes from COVID-19 patients with varying degrees of severity through differential expression and pathway analysis. We made pairwise comparisons between three different severity groups: mild, moderate, and severe. Severity analyses revealed major upregulation of genes and pathways related to the inflammatory immune response with increasing severity, with notable increases in genes and pathways related to neutrophil-and macrophage-mediated immunity accompanied by decreases in pathways related to T cell-mediated immunity. We observed considerable transcriptomic differences between moderate COVID-19 individuals who received steroids to those who did not receive steroids, suggesting that this immunomodulatory treatment has a profound impact on the transcriptome. The widespread use of steroids in COVID-19 and the transcriptional disruption we observed in patients receiving steroids support the inclusion of immunomodulatory treatment in models related to COVID-19 transcriptomic analyses, in order to account for their impacts on the transcriptome.
Immune dysregulation has been extensively discussed as a contributing factor in the progression to severe COVID-19 5,8,[36][37][38] . Infection with SARS-CoV-2 has been shown to induce a lower interferon response and an enhanced pro-inflammatory cytokine response in comparison to other viruses 39 . This pro-inflammatory cytokine response leads to the attraction of monocytes and neutrophils, the development of a cytokine storm and hyperinflammation, and is likely to contribute to COVID-19 severity 36,39 . Elevated levels of inflammatory cytokines and chemokines have been identified in plasma and serum from patients with increasing COVID-19 severity 37,40,41 . We observed a domination of immune system associated pathways upregulated with increasing severity, with inflammatory immune pathways consistently identified.
Comparison of the in silico estimates of immune cell proportions between the three severity groups revealed significant differences in all immune cell types except B cells. Lymphopenia was identified early in the pandemic as a key feature of COVID-19 severity 42,43 . Our results reflect this as the levels of CD4 and CD8 T cells and NK cells reduce with increasing severity, whilst pathway analysis contrasting severe COVID-19 to either moderate or mild COVID-19 also revealed downregulation of many T cell-related pathways.
Aschenbrenner et al. 18 observed neutrophil-specific gene expression changes with increasing COVID-19 severity in whole blood. We found that the proportions of in silico neutrophil estimates increased with severity. Furthermore, we identified various pathways related to neutrophils that increased with severity, such as TREM1 signalling, Inhibition of Matrix Metalloproteases and the STAT3 pathway. TREM1 has been associated with neutrophil migration across airway epithelial cells and has been suggested to increase inflammation through neutrophil migration into the lung 44 . Matrix Metalloproteases (MMPs) are activated by neutrophil elastase and increased levels of multiple MMPs have been associated with increased COVID-19 severity 45,46 .
To add to observations of upregulated neutrophil-related pathways and increased in silico neutrophil estimates with severity, Fig. 3A shows a clear cluster of neutrophil-associated genes with high absolute LFC values that were SDE with additive severity. When the blood transcriptomic profiles of patients with mild COVID-19 were compared with the profiles from patients with moderate or severe COVID-19, CEACAM8, MMP8, ELANE, LTF, CEACAM6 and MPO were consistently amongst the top SDE genes, with levels increasing with severity and high LFCs. MMP8 was also found to have an additive effect with severity. ELANE, MPO and PRTN3, which are linked to neutrophil degranulation and NETosis, have been found to be significantly altered in naso-oropharyngeal samples of SARS-CoV-2 infected patients 47 . ELANE, LTF, CEACAM8 and MMP8 have been identified as being expressed in developing neutrophils, a novel cell subtype that was discovered through single-cell RNA sequencing of hospitalised COVID-19 patients, specifically identified in patients with acute respiratory distress syndrome (ARDS) 48,49 . CEACAM6 has been identified as having high expression in Type II pneumocytes in COVID-19 patients, the cells targeted by SARS-CoV-2 50 , and it has been suggested that cross-talk between Type II pneumocytes and developing neutrophils in COVID-19 occurs via CEACAM8-CEACAM6 49 . It is possible that this cross-talk may promote differentiation of developing neutrophils, leading to further COVID-19 progression 49 .
Pathways related to macrophage activation were identified with increasing COVID-19 severity. Increased levels of macrophage inflammatory protein 1α and monocyte-derived FCN1 + macrophage cells have been detected in individuals with more severe COVID-19 51,52 . The observations of increased inflammatory immune pathways and increased neutrophil and macrophage activity were accompanied with downregulation of T cell related pathways with increasing severity, including upregulation and downregulation of Th2 and Th1 pathways, respectively, with increasing severity. These findings suggest that the balance between different immune cell types is a key component that influences severity of COVID-19.
When immune cell proportions were included in the pairwise severity models, there was a considerable reduction in the number of SDE genes identified (Fig. 2), with only one gene identified as SDE between severe and moderate COVID-19. This observation indicates that much of the transcriptomic differences between individuals with varying severity of COVID-19 are driven by different proportions of immune cells. Furthermore, a large proportion of the pathways identified as enriched with severity were related to functions of different immune cells, highlighting that they play a major role in the pathogenesis of severe COVID-19. www.nature.com/scientificreports/ Other pathways of interest include pathways related to hypoxia that were enriched with increasing severity. Hypoxia is a primary feature and major cause of mortality amongst patients with severe COVID-19 53 . Various pathways related to protein production and DNA/protein damage were identified as downregulated with increasing COVID-19 severity. SARS-CoV-2 has been shown to cause major disruption to host protein production, for example viral protein NSP1 has been shown to bind to the host 40S ribosomal subunit resulting in mRNA translation shut down 54,55 . Furthermore, coronaviruses have also been shown to use DNA damage to induce cell cycle arrest 56 . Damaged DNA can lead to accumulation of nuclear DNA in the cytoplasm which triggers innate immune responses 57 .
Neutrophilia and neutrophil activation are usually markers of bacterial infection and are uncommon in most uncomplicated viral infections. In view of the association between increasing severity of COVID-19 with increased neutrophil counts, and expression of neutrophil genes including those involved in degranulation, NETosis, and neutrophil-mediated tissue injury, a key question is what mechanisms are responsible for the shift from the normal "viral" response in mild COVID-19, to the severe inflammatory process involving neutrophils in severe disease. The timing of the inflammatory phase of COVID-19 that usually occurs in the second week of illness 58,59 , together with the increased expression of immunoglobulin genes that we observed (e.g., IGKV1D-13, IGHV3-43, IGLV4-3, IGLV3-16, IGLV3-10) may suggest that immunoglobulin, directed at either viral or modified self-antigens, may be involved in neutrophil and macrophage activation through Fc-gamma receptors or complement mediated activation, genes related to which are upregulated in severe disease (e.g., FCGR2A, FCGR3B, FCGR3A, CR1, C3AR1, C5AR1). Neutrophil activation and neutrophil-mediated tissue injury may be a promising target for therapeutic interventions.
This study has identified many genes and pathways that are associated with differing COVID-19 severity, amongst which there could be some promising novel targets for immunomodulatory therapies for preventing severe COVID-19. As such, the genes and pathways identified here warrant further investigation.
Limitations. COVID-19 severity is highly influenced by age and sex at birth which leads to major confounding between COVID-19 and these two variables. Although we controlled for these confounders (by including sex, age, and the interaction between age and severity) when exploring transcriptomic changes with severity, it is possible that we may have (a) failed to identify key drivers of severity as they are confounded with age or sex, (b) inadvertently included spurious genes that are really driven by age or sex rather than severity. Furthermore, patients included in this study had various comorbidities, with the frequency of some comorbidities higher in severe patients (obesity, smoking, endocrine conditions). These comorbidities could influence gene expression changes, introducing another confounder into the analysis. Due to lack of detailed documentation for all subjects for these comorbidities, it was not possible to correct for the variance introduced as was done with age and sex. The sample sizes in our analyses are modest for some severity groups. For example, in the severe COVID-19 group, only 10 samples could be included as the rest were excluded due to concomitant bacterial infection. These samples were excluded because coinfections were likely to have had profound transcriptional impacts and may have masked the genuine SARS-CoV-2 signal.
In addition, we were not able to correct for all types of immunomodulatory treatment. Specifically, macrolides which are known immunomodulatory antibiotics, were administered to all moderate and severe COVID-19 patients and to none of the mild COVID-19 patients ( Table 1). As a result, we were unable to include macrolides in our models accounting for immunomodulatory treatment since the model would have been in full rank, i.e., impossible to disentangle the severity and macrolides.

Conclusion.
We have explored the transcriptomic impact of SARS-CoV-2 infection through evaluating the transcriptomic differences between individuals with varying levels of COVID-19 severity. We have observed considerable transcriptomic perturbation which offers insights into the host factors that influence development of severe COVID-19. Upregulation of inflammatory immune pathways was observed with increasing severity, with multiple neutrophil, macrophage and immunoglobulin-associated genes and pathways identified, suggesting that increased COVID-19 severity may be mediated in part by neutrophil activation, which may be related to production of immunoglobulin as acquired immunity develops. Furthermore, we have discovered that administration of steroids leads to profound changes in the whole blood transcriptome of individuals with similar COVID-19 severity, highlighting the importance of considering the effects of treatment in future COVID-19 transcriptomic studies.

Data availability
Gene counts and sample metadata are available on ArrayExpress under the accession E-MTAB-10926. Code used in the analyses can be found at: https:// github. com/ PIDBG/ sarsc ov2_ trans cript omics/ tree/ main/ sever ity_ analy sis.